-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
speed up /members and add at= and membership params #3568
Conversation
as per the proposal; we can deduplicate redundant lazy-loaded members which are sent in the same sync sequence. we do this heuristically rather than requiring the client to somehow tell us which members it has chosen to cache, by instead caching the last N members sent to a client, and not sending them again. For now we hardcode N to 100. Each cache for a given (user,device) tuple is in turn cached for up to X minutes (to avoid the caches building up). For now we hardcode X to 30.
@@ -89,6 +89,69 @@ def _get_current_state_ids_txn(txn): | |||
_get_current_state_ids_txn, | |||
) | |||
|
|||
# FIXME: how should this be cached? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understanding the caching stuff to know how best to handle this - or whether this method should be combined with get_current_state_ids above, and somehow share the same caching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you've got three choices:
-
treat each call to
get_filtered_current_state_ids
as a separate cacheable thing - so for example if you do a lookup ofget_filtered_current_state_ids(room, None, None)
, a lookup ofget_filtered_current_state_ids(room, [(t, k)], None)
would not hit the cache. To do this, you just need an @cached decorator, plus invalidation calls. -
Implement this in terms of a call to
get_current_state_ids
followed by a filter. This might increase cache hits, but cache misses will of course be very expensive. -
use a DictionaryCache, like
_state_group_cache
uses, sinceDictionaryCache
has support for partial caches. The keys would be room ids rather than state group ids. again you'd need to consider invalidation. -
Not bother with caching and see how much it hurts?
this feels much clunkier and more complicated for both clients and senders than just querying based on event_id, and i'm failing to see any way it's more correct than querying based on event_id. also fixes a thinko to check whether the user is allowed to view membership as of the given token
The conclusions of the API shape were roughly:
However, as requested out-of-band, I've switched /members?at to take a stream token rather than an event ID. I am failing to justify this to myself as I can't see any bug introduced by using event IDs, and meanwhile the implementation gets more complicated both serverside & clientside by thinking in terms of stream tokens instead. I've done it anyway to try to get this through review before getting blocked any further. matrix-org/sytest#469 has also been updated. @richvdh PTAL. |
one question which using event ids opens is: are you talking about the state before or after that event? I think you want the state after the event, but nothing else that takes an event_id does that. Indeed, I don't think "after event X" is powerful enough on its own to define a particular point in the room state. Consider a room with a diamond in the DAG:
B and C are both state events. Does the state after B change once C arrives? You can solve it by knowing that B's stream ordering is lower than C's, so that a client referring to the "state after B" will not have seen the state at C - but then you are just using the event id as a token in the sync stream, for which we use sync tokens. (A server could just use event ids as its stream tokens. But we don't mandate that, because it leaves more options for server implementations). |
synapse/handlers/message.py
Outdated
@defer.inlineCallbacks | ||
def get_state_events( | ||
self, user_id, room_id, types=None, filtered_types=None, | ||
at_token=None, is_guest=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: trailing comma would be nice :/
synapse/handlers/message.py
Outdated
at_token.room_key | ||
).stream | ||
|
||
# XXX: is this the right method to be using? What id we don't yet have an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it's not; as you note, there is a problem if we don't yet have an event after the token, which is moderately likely given my understanding of the proposed usecase (cf https://docs.google.com/document/d/11yn-mAkYll10RJpN0mkYEVqraTbU3U4eQx9MNrzqX1U/edit?disco=AAAACCLJnz0: please can we have an answer there?)
I'd suggest doing the same as SyncHandler.get_state_at
(which is basically to call get_recent_events_for_room
and then apply the change implied by the event in question - though as the comments there say, that is buggy too :/)
synapse/handlers/message.py
Outdated
room_state = yield self.store.get_state_for_events( | ||
[membership_event_id], None | ||
# check we are even allowed to be reading the room at this point | ||
event = yield self.store.get_event(event_id, allow_none=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why allow_none? it's only going to make it explode in a more subtle way
synapse/handlers/message.py
Outdated
event = yield self.store.get_event(event_id, allow_none=True) | ||
visible_events = yield filter_events_for_client(self.store, user_id, [event]) | ||
|
||
if len(visible_events) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just if visible_events
is more pythonic here
synapse/handlers/message.py
Outdated
) | ||
room_state = room_state[event.event_id] | ||
else: | ||
room_state = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the logic for returning an empty result rather than a 403?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only that get_state_events() felt like a pretty generic method, and raising a 403 might break stuff upstream. in practice it's only used by /state and /members so i think it's fine - have switched to throw 404 if the token isn't recognised and 403 if the event at that token is filtered out wrt our user.
@richvdh thanks for review - ptal. |
synapse/handlers/message.py
Outdated
room_state = yield self.store.get_state_for_events( | ||
[membership_event_id], None | ||
if not last_events: | ||
raise SynapseError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use NotFoundError here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done in 0d5770d)
synapse/handlers/message.py
Outdated
@@ -82,14 +82,49 @@ def get_room_data(self, user_id=None, room_id=None, | |||
defer.returnValue(data) | |||
|
|||
@defer.inlineCallbacks | |||
def get_state_events(self, user_id, room_id, is_guest=False): | |||
def _check_in_room_or_world_readable(self, room_id, user_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in f5189b9
I'm going to squash-merge this |
Features -------- - Add support for the SNI extension to federation TLS connections. Thanks to @vojeroen! ([\#3439](#3439)) - Add /_media/r0/config ([\#3184](#3184)) - speed up /members API and add `at` and `membership` params as per MSC1227 ([\#3568](#3568)) - implement `summary` block in /sync response as per MSC688 ([\#3574](#3574)) - Add lazy-loading support to /messages as per MSC1227 ([\#3589](#3589)) - Add ability to limit number of monthly active users on the server ([\#3633](#3633)) - Support more federation endpoints on workers ([\#3653](#3653)) - Basic support for room versioning ([\#3654](#3654)) - Ability to disable client/server Synapse via conf toggle ([\#3655](#3655)) - Ability to whitelist specific threepids against monthly active user limiting ([\#3662](#3662)) - Add some metrics for the appservice and federation event sending loops ([\#3664](#3664)) - Where server is disabled, block ability for locked out users to read new messages ([\#3670](#3670)) - set admin uri via config, to be used in error messages where the user should contact the administrator ([\#3687](#3687)) - Synapse's presence functionality can now be disabled with the "use_presence" configuration option. ([\#3694](#3694)) - For resource limit blocked users, prevent writing into rooms ([\#3708](#3708)) Bugfixes -------- - Fix occasional glitches in the synapse_event_persisted_position metric ([\#3658](#3658)) - Fix bug on deleting 3pid when using identity servers that don't support unbind API ([\#3661](#3661)) - Make the tests pass on Twisted < 18.7.0 ([\#3676](#3676)) - Don’t ship recaptcha_ajax.js, use it directly from Google ([\#3677](#3677)) - Fixes test_reap_monthly_active_users so it passes under postgres ([\#3681](#3681)) - Fix mau blocking calulation bug on login ([\#3689](#3689)) - Fix missing yield in synapse.storage.monthly_active_users.initialise_reserved_users ([\#3692](#3692)) - Improve HTTP request logging to include all requests ([\#3700](#3700)) - Avoid timing out requests while we are streaming back the response ([\#3701](#3701)) - Support more federation endpoints on workers ([\#3705](#3705), [\#3713](#3713)) - Fix "Starting db txn 'get_all_updated_receipts' from sentinel context" warning ([\#3710](#3710)) - Fix bug where `state_cache` cache factor ignored environment variables ([\#3719](#3719)) - Fix bug in v0.33.3rc1 which caused infinite loops and OOMs ([\#3723](#3723)) - Fix bug introduced in v0.33.3rc1 which made the ToS give a 500 error ([\#3732](#3732)) Deprecations and Removals ------------------------- - The Shared-Secret registration method of the legacy v1/register REST endpoint has been removed. For a replacement, please see [the admin/register API documentation](https://github.com/matrix-org/synapse/blob/master/docs/admin_api/register_api.rst). ([\#3703](#3703)) Internal Changes ---------------- - The test suite now can run under PostgreSQL. ([\#3423](#3423)) - Refactor HTTP replication endpoints to reduce code duplication ([\#3632](#3632)) - Tests now correctly execute on Python 3. ([\#3647](#3647)) - Sytests can now be run inside a Docker container. ([\#3660](#3660)) - Port over enough to Python 3 to allow the sytests to start. ([\#3668](#3668)) - Update docker base image from alpine 3.7 to 3.8. ([\#3669](#3669)) - Rename synapse.util.async to synapse.util.async_helpers to mitigate async becoming a keyword on Python 3.7. ([\#3678](#3678)) - Synapse's tests are now formatted with the black autoformatter. ([\#3679](#3679)) - Implemented a new testing base class to reduce test boilerplate. ([\#3684](#3684)) - Rename MAU prometheus metrics ([\#3690](#3690)) - add new error type ResourceLimit ([\#3707](#3707)) - Logcontexts for replication command handlers ([\#3709](#3709)) - Update admin register API documentation to reference a real user ID. ([\#3712](#3712))
builds on #3567 (in turn #3331 (in turn #2970)).(all now merged)